2018-04-29

You will learn in this session

  • how to handle temporal and spatial autocorrelation
  • that GLMMs are not very difficult if you already know GLMs and LMMs
  • that random effects, too, can follow non-Gaussian distributions
  • that there are even more general methods than GLMMs out there: HGLMs and DHGLMs
  • how to handle heteroscedasticity

Temporal autocorrelation

Temporal autocorrelation in discrete (equally spaced) time steps

The AirPassengers data

AirPassengers
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432

The AirPassengers data

plot(AirPassengers)

Reformatting the dataset for the fit

air <- data.frame(passengers = as.numeric(AirPassengers),
                  year = rep(1949:1960, each = 12),
                  month = factor(rep(1:12, 12)))
air
##     passengers year month
## 1          112 1949     1
## 2          118 1949     2
## 3          132 1949     3
## 4          129 1949     4
## 5          121 1949     5
## 6          135 1949     6
## 7          148 1949     7
## 8          148 1949     8
## 9          136 1949     9
## 10         119 1949    10
## 11         104 1949    11
## 12         118 1949    12
## 13         115 1950     1
## 14         126 1950     2
## 15         141 1950     3
## 16         135 1950     4
## 17         125 1950     5
## 18         149 1950     6
## 19         170 1950     7
## 20         170 1950     8
## 21         158 1950     9
## 22         133 1950    10
## 23         114 1950    11
## 24         140 1950    12
## 25         145 1951     1
## 26         150 1951     2
## 27         178 1951     3
## 28         163 1951     4
## 29         172 1951     5
## 30         178 1951     6
## 31         199 1951     7
## 32         199 1951     8
## 33         184 1951     9
## 34         162 1951    10
## 35         146 1951    11
## 36         166 1951    12
## 37         171 1952     1
## 38         180 1952     2
## 39         193 1952     3
## 40         181 1952     4
## 41         183 1952     5
## 42         218 1952     6
## 43         230 1952     7
## 44         242 1952     8
## 45         209 1952     9
## 46         191 1952    10
## 47         172 1952    11
## 48         194 1952    12
## 49         196 1953     1
## 50         196 1953     2
## 51         236 1953     3
## 52         235 1953     4
## 53         229 1953     5
## 54         243 1953     6
## 55         264 1953     7
## 56         272 1953     8
## 57         237 1953     9
## 58         211 1953    10
## 59         180 1953    11
## 60         201 1953    12
## 61         204 1954     1
## 62         188 1954     2
## 63         235 1954     3
## 64         227 1954     4
## 65         234 1954     5
## 66         264 1954     6
## 67         302 1954     7
## 68         293 1954     8
## 69         259 1954     9
## 70         229 1954    10
## 71         203 1954    11
## 72         229 1954    12
## 73         242 1955     1
## 74         233 1955     2
## 75         267 1955     3
## 76         269 1955     4
## 77         270 1955     5
## 78         315 1955     6
## 79         364 1955     7
## 80         347 1955     8
## 81         312 1955     9
## 82         274 1955    10
## 83         237 1955    11
## 84         278 1955    12
## 85         284 1956     1
## 86         277 1956     2
## 87         317 1956     3
## 88         313 1956     4
## 89         318 1956     5
## 90         374 1956     6
## 91         413 1956     7
## 92         405 1956     8
## 93         355 1956     9
## 94         306 1956    10
## 95         271 1956    11
## 96         306 1956    12
## 97         315 1957     1
## 98         301 1957     2
## 99         356 1957     3
## 100        348 1957     4
## 101        355 1957     5
## 102        422 1957     6
## 103        465 1957     7
## 104        467 1957     8
## 105        404 1957     9
## 106        347 1957    10
## 107        305 1957    11
## 108        336 1957    12
## 109        340 1958     1
## 110        318 1958     2
## 111        362 1958     3
## 112        348 1958     4
## 113        363 1958     5
## 114        435 1958     6
## 115        491 1958     7
## 116        505 1958     8
## 117        404 1958     9
## 118        359 1958    10
## 119        310 1958    11
## 120        337 1958    12
## 121        360 1959     1
## 122        342 1959     2
## 123        406 1959     3
## 124        396 1959     4
## 125        420 1959     5
## 126        472 1959     6
## 127        548 1959     7
## 128        559 1959     8
## 129        463 1959     9
## 130        407 1959    10
## 131        362 1959    11
## 132        405 1959    12
## 133        417 1960     1
## 134        391 1960     2
## 135        419 1960     3
## 136        461 1960     4
## 137        472 1960     5
## 138        535 1960     6
## 139        622 1960     7
## 140        606 1960     8
## 141        508 1960     9
## 142        461 1960    10
## 143        390 1960    11
## 144        432 1960    12

Looking at the average trend per year

plot(with(air, tapply(passengers, year, mean)) ~ I(1949:1960),
     ylab = "Mean number of passengers", xlab = "Year", type = "b")

Looking at the average trend per month

plot(with(air, tapply(passengers, month, mean)) ~ I(1:12),
     ylab = "Mean number of passengers", xlab = "Month", type = "b")

Simple fit

(mod_air <- lm(passengers ~ year * month, data = air))
## 
## Call:
## lm(formula = passengers ~ year * month, data = air)
## 
## Coefficients:
##  (Intercept)          year        month2        month3        month4        month5        month6        month7  
##   -5.407e+04     2.779e+01     6.356e+03     2.129e+02    -3.118e+03    -7.275e+03    -1.786e+04    -2.994e+04  
##       month8        month9       month10       month11       month12   year:month2   year:month3   year:month4  
##   -2.936e+04    -1.232e+04    -5.237e+03     3.060e+03    -1.094e+03    -3.255e+00    -9.441e-02     1.608e+00  
##  year:month5   year:month6   year:month7   year:month8   year:month9  year:month10  year:month11  year:month12  
##    3.738e+00     9.171e+00     1.537e+01     1.508e+01     6.336e+00     2.692e+00    -1.570e+00     5.699e-01

The problem

plot(residuals(mod_air), type = "b")
abline(h = 0, lty = 2, col = "red")

The problem

lmtest::dwtest(mod_air)
## 
##  Durbin-Watson test
## 
## data:  mod_air
## DW = 0.35903, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

The problem

acf(residuals(mod_air))
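
The geometric decay visible in the ACF is the signature of an AR(1) process, in which each residual is a fraction phi of the previous one plus fresh noise. A minimal base-R simulation (with a hypothetical phi of 0.5) reproduces this pattern:

```r
## Simulate an AR(1) process: e_t = phi * e_{t-1} + innovation
set.seed(123)
phi <- 0.5
n <- 1000
e <- numeric(n)
for (t in 2:n) e[t] <- phi * e[t - 1] + rnorm(1)

## the empirical lag-k autocorrelation decays roughly as phi^k
round(acf(e, lag.max = 4, plot = FALSE)$acf[2:5], 2)
round(phi^(1:4), 2)
```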

Solution

library(nlme)
MAR1 <- corAR1(value = 0.5, form = ~ 1|year, fixed = FALSE)
MAR1 <- Initialize(MAR1, data = air)
round(corMatrix(MAR1)[["1950"]], 2)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 1.00 0.50 0.25 0.13 0.06 0.03 0.02 0.01 0.00  0.00  0.00  0.00
##  [2,] 0.50 1.00 0.50 0.25 0.13 0.06 0.03 0.02 0.01  0.00  0.00  0.00
##  [3,] 0.25 0.50 1.00 0.50 0.25 0.13 0.06 0.03 0.02  0.01  0.00  0.00
##  [4,] 0.13 0.25 0.50 1.00 0.50 0.25 0.13 0.06 0.03  0.02  0.01  0.00
##  [5,] 0.06 0.13 0.25 0.50 1.00 0.50 0.25 0.13 0.06  0.03  0.02  0.01
##  [6,] 0.03 0.06 0.13 0.25 0.50 1.00 0.50 0.25 0.13  0.06  0.03  0.02
##  [7,] 0.02 0.03 0.06 0.13 0.25 0.50 1.00 0.50 0.25  0.13  0.06  0.03
##  [8,] 0.01 0.02 0.03 0.06 0.13 0.25 0.50 1.00 0.50  0.25  0.13  0.06
##  [9,] 0.00 0.01 0.02 0.03 0.06 0.13 0.25 0.50 1.00  0.50  0.25  0.13
## [10,] 0.00 0.00 0.01 0.02 0.03 0.06 0.13 0.25 0.50  1.00  0.50  0.25
## [11,] 0.00 0.00 0.00 0.01 0.02 0.03 0.06 0.13 0.25  0.50  1.00  0.50
## [12,] 0.00 0.00 0.00 0.00 0.01 0.02 0.03 0.06 0.13  0.25  0.50  1.00
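
The matrix above is simply rho^|i - j| with rho = 0.5: the correlation between two observations drops by a factor of rho for every additional time step separating them. It can be rebuilt in one line with outer():

```r
## AR(1) correlation between time points i and j is rho^|i - j|
rho <- 0.5
ar1_cor <- outer(1:12, 1:12, function(i, j) rho^abs(i - j))
round(ar1_cor[1:4, 1:4], 2)  ## compare with corMatrix(MAR1) above
```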

Solution

(mod_air2 <- lme(passengers ~ month * year, random = ~ 1 | year, data = air,
                 correlation = MAR1, method = "REML"))
## Linear mixed-effects model fit by REML
##   Data: air 
##   Log-restricted-likelihood: -477.871
##   Fixed: passengers ~ month * year 
##   (Intercept)        month2        month3        month4        month5        month6        month7        month8 
## -5.406738e+04  6.355626e+03  2.129324e+02 -3.118268e+03 -7.275373e+03 -1.785545e+04 -2.993915e+04 -2.935851e+04 
##        month9       month10       month11       month12          year   month2:year   month3:year   month4:year 
## -1.232239e+04 -5.237282e+03  3.059512e+03 -1.093845e+03  2.778671e+01 -3.255245e+00 -9.440559e-02  1.608392e+00 
##   month5:year   month6:year   month7:year   month8:year   month9:year  month10:year  month11:year  month12:year 
##  3.737762e+00  9.171329e+00  1.537413e+01  1.507692e+01  6.335664e+00  2.692308e+00 -1.569930e+00  5.699301e-01 
## 
## Random effects:
##  Formula: ~1 | year
##         (Intercept) Residual
## StdDev:     13.4979  8.34667
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | year 
##  Parameter estimate(s):
##       Phi 
## 0.3270032 
## Number of Observations: 144
## Number of Groups: 12

Alternative code

(mod_air2b <- lme(passengers ~ month * year, random = ~ 1 | year, data = air,
                 correlation = corAR1(form = ~ 1|year), method = "REML"))
## Linear mixed-effects model fit by REML
##   Data: air 
##   Log-restricted-likelihood: -477.871
##   Fixed: passengers ~ month * year 
##   (Intercept)        month2        month3        month4        month5        month6        month7        month8 
## -5.406738e+04  6.355626e+03  2.129324e+02 -3.118268e+03 -7.275373e+03 -1.785545e+04 -2.993915e+04 -2.935851e+04 
##        month9       month10       month11       month12          year   month2:year   month3:year   month4:year 
## -1.232239e+04 -5.237282e+03  3.059512e+03 -1.093845e+03  2.778671e+01 -3.255245e+00 -9.440559e-02  1.608392e+00 
##   month5:year   month6:year   month7:year   month8:year   month9:year  month10:year  month11:year  month12:year 
##  3.737762e+00  9.171329e+00  1.537413e+01  1.507692e+01  6.335664e+00  2.692308e+00 -1.569930e+00  5.699301e-01 
## 
## Random effects:
##  Formula: ~1 | year
##         (Intercept) Residual
## StdDev:     13.4979 8.346669
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | year 
##  Parameter estimate(s):
##       Phi 
## 0.3270031 
## Number of Observations: 144
## Number of Groups: 12

Testing the temporal autocorrelation

mod_air3 <- lme(passengers ~ month * year, random = ~ 1 | year, data = air, method = "REML")
anova(mod_air2, mod_air3)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_air2     1 27 1009.742 1085.004 -477.8710                        
## mod_air3     2 26 1017.362 1089.837 -482.6811 1 vs 2 9.620272  0.0019
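
The test reported by anova() can be reproduced by hand: the statistic is twice the difference in restricted log-likelihoods, compared against a chi-squared distribution with 1 degree of freedom (the single extra parameter, Phi):

```r
## likelihood-ratio test for the AR(1) parameter, using the
## log-likelihoods printed in the anova() table above
logLik_AR1  <- -477.8710  # mod_air2 (with corAR1)
logLik_noAR <- -482.6811  # mod_air3 (without)
LR <- 2 * (logLik_AR1 - logLik_noAR)
p  <- pchisq(LR, df = 1, lower.tail = FALSE)
round(c(L.Ratio = LR, p.value = p), 4)
```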

Alternative autocorrelation structures

mod_airARMA1 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 1, q = 0))
mod_airARMA2 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 4, q = 0))
mod_airARMA3 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 2, q = 2))

rbind(mod_air2 = AIC(mod_air2),
      mod_airARMA1 = AIC(mod_airARMA1),
      mod_airARMA2 = AIC(mod_airARMA2),
      mod_airARMA3 = AIC(mod_airARMA3))
##                  [,1]
## mod_air2     1009.742
## mod_airARMA1 1009.742
## mod_airARMA2 1008.911
## mod_airARMA3 1011.174


Note: do not compare AICs or likelihoods from nlme to those from other packages!

(nlme appears to omit a constant term from its likelihoods…)
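
Within nlme, however, the AIC values are the usual AIC = -2 logLik + 2k, with k counting all estimated parameters (27 for mod_air2: 24 fixed-effect coefficients, the random-intercept SD, the residual SD, and Phi):

```r
## recomputing AIC(mod_air2) from the values shown by anova() earlier
logLik_mod_air2 <- -477.8710
df_mod_air2 <- 27
-2 * logLik_mod_air2 + 2 * df_mod_air2
```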

Fitted values

mod_air4 <- update(mod_air2, correlation = corARMA(form = ~ 1 | year, p = 4, q = 0), method = "ML")
data.for.plot <- expand.grid(month = factor(1:12), year = 1949:1960)
data.for.plot$obs <- air$passengers
data.for.plot$time <- seq(1949, 1960, length = (1960 - 1949 + 1) * 12)
data.for.plot$fit_lm <- predict(mod_air)
data.for.plot$fit_lme <- predict(mod_air4)

Fitted values

plot(obs ~ time, data = data.for.plot, type = "l", ylim = c(0, 700), ylab = "Passengers")
points(fit_lm ~ time, data = data.for.plot, type = "l", col = "red")
points(fit_lme ~ time, data = data.for.plot, type = "l", col = "blue")

Better, but good enough?

plot(residuals(mod_air4), type = "l")
abline(h = 0, lty = 2, col = "red")

Temporal autocorrelation in continuous time

The nlme::BodyWeight dataset

data("BodyWeight", package = "nlme")
plot(BodyWeight)

The nlme::BodyWeight dataset

body <- as.data.frame(BodyWeight)
body$Rat <- factor(body$Rat, levels = 1:16, ordered = FALSE)
str(body)
## 'data.frame':    176 obs. of  4 variables:
##  $ weight: num  240 250 255 260 262 258 266 266 265 272 ...
##  $ Time  : num  1 8 15 22 29 36 43 44 50 57 ...
##  $ Rat   : Factor w/ 16 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Diet  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
unique(body$Time)
##  [1]  1  8 15 22 29 36 43 44 50 57 64

Fitting the model

(mod_rat1 <- lme(weight ~ Diet * Time, random = ~ Time|Rat, data = body))
## Linear mixed-effects model fit by REML
##   Data: body 
##   Log-restricted-likelihood: -575.8599
##   Fixed: weight ~ Diet * Time 
## (Intercept)       Diet2       Diet3        Time  Diet2:Time  Diet3:Time 
## 251.6516516 200.6654865 252.0716778   0.3596391   0.6058392   0.2983375 
## 
## Random effects:
##  Formula: ~Time | Rat
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 36.9390723 (Intr)
## Time         0.2484113 -0.149
## Residual     4.4436052       
## 
## Number of Observations: 176
## Number of Groups: 16

Checking residuals

plot(mod_rat1) ## there is some heteroscedasticity but we will ignore it for now

Checking residuals

plot(residuals(mod_rat1), type = "b")

Fitting continuous temporal autocorrelation

(mod_rat2 <- lme(weight ~ Diet * Time, random = ~ Time|Rat, correlation = corExp(form = ~ Time), data = body))
## Linear mixed-effects model fit by REML
##   Data: body 
##   Log-restricted-likelihood: -566.0296
##   Fixed: weight ~ Diet * Time 
## (Intercept)       Diet2       Diet3        Time  Diet2:Time  Diet3:Time 
## 251.5924463 200.6889077 252.3152061   0.3602961   0.6255177   0.3109452 
## 
## Random effects:
##  Formula: ~Time | Rat
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 36.9744887 (Intr)
## Time         0.2434607 -0.149
## Residual     4.5501283       
## 
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Time | Rat 
##  Parameter estimate(s):
##    range 
## 3.650519 
## Number of Observations: 176
## Number of Groups: 16
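
With corExp, the correlation between two residuals separated by d time units declines continuously as exp(-d / range), so measurements taken on consecutive days (e.g. Time 43 and 44) are far more strongly correlated than weekly ones. Using the estimated range of about 3.65:

```r
## exponential correlation in continuous time: corr(d) = exp(-d / range)
range_hat <- 3.650519      # 'range' estimate printed above
d <- c(1, 7, 14, 21)       # days separating two measurements
round(exp(-d / range_hat), 3)
```

Two measurements one day apart thus keep a residual correlation of about 0.76, which has essentially vanished after three weeks.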

Model comparison using nlme

anova(mod_rat1, mod_rat2)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_rat1     1 10 1171.720 1203.078 -575.8599                        
## mod_rat2     2 11 1154.059 1188.553 -566.0296 1 vs 2 19.66057  <.0001


Note: the comparison makes sense as models are nested and fitted with REML.

Same fit with spaMM

library(spaMM)
(mod_rat_spaMM <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time),
                        data = body, method = "REML"))
## formula: weight ~ Diet * Time + (Time | Rat) + AR1(1 | Time)
## REML: Estimation of lambda, phi, corrPars and ranCoefs by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept) 251.6434 13.15459  19.130
## Diet2       200.6655 22.67950   8.848
## Diet3       252.0717 22.67950  11.115
## Time          0.3682  0.09668   3.808
## Diet2:Time    0.6058  0.15786   3.838
## Diet3:Time    0.2983  0.15786   1.890
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
##                    --- Correlation parameters:
##   2.ARphi 
## 0.8440847 
##          --- Random-coefficients Cov matrices:
##  Group        Term   Var.   Corr.
##    Rat (Intercept)   1366        
##    Rat        Time 0.0624 -0.1507
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    Time  :  2.844  
##              --- Coefficients for log(lambda):
##  Group        Term Estimate Cond.SE
##   Time (Intercept)    1.045  0.5868
## # of obs: 176; # of groups: Rat, 32; Time, 11 
##  ------------- Residual variance  -------------
## Coefficients for log(phi) ~ 1 :
##             Estimate Cond. SE
## (Intercept)    2.826     0.12
## Estimate of phi=residual var:  16.88 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -577.0637
##   p_beta,v(h) (ReL): -569.5571

Fitted values: nlme vs spaMM

plot(predict(mod_rat_spaMM), predict(mod_rat2))
abline(0, 1, col = "red")

Model comparison

mod_rat_spaMM2 <- fitme(weight ~ Diet * Time + (Time|Rat), data = body,
                        method = "REML")

print(AIC(mod_rat_spaMM))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            1174.1275            1035.8757            1147.1142             139.0042
print(AIC(mod_rat_spaMM2))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##            1180.5026            1057.5118            1153.7197             144.9506

Testing the overall effect of diet

spaMM

mod_rat_spaMM3ML <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = body,
                            method = "ML")
mod_rat_no_diet <- fitme(weight ~ 1 + Time + (Time|Rat) + AR1(1|Time), data = body,
                            method = "ML")
anova(mod_rat_spaMM3ML, mod_rat_no_diet)
##      chi2_LR df      p_value
## p_v 47.77998  4 1.048901e-09
c(logLik(mod_rat_spaMM3ML), logLik(mod_rat_no_diet))
##       p_v       p_v 
## -576.7546 -600.6446

Testing the overall effect of diet

nlme

mod_rat3ML <- lme(weight ~ Diet * Time, random = ~ Time|Rat,
                  correlation = corExp(form = ~ Time, nugget = TRUE), data = body, method = "ML")

mod_rat_no_diet2 <- lme(weight ~ 1 + Time, random = ~ Time|Rat,
                        correlation = corExp(form = ~ Time, nugget = TRUE), data = body, method = "ML")
anova(mod_rat3ML, mod_rat_no_diet2)
##                  Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mod_rat3ML           1 12 1165.962 1204.007 -570.9808                        
## mod_rat_no_diet2     2  8 1206.618 1231.982 -595.3088 1 vs 2 48.65601  <.0001

Revisiting the airplane passengers with spaMM::fitme()

air$time <- 1:nrow(air)
air$year_z <- scale(air$year) ## otherwise hard to fit!
mod_air_spaMM1 <- fitme(passengers ~ month * year_z + AR1(1|time), data = air, method = "REML")
mod_air_spaMM2 <- fitme(passengers ~ month * year_z, data = air, method = "REML")

print(AIC(mod_air_spaMM1))
##        marginal AIC:     conditional AIC:      dispersion AIC:        effective df: 
##           1062.72936            986.55102            915.08419             75.90128
print(AIC(mod_air_spaMM2))
##        marginal AIC: 
##             1233.527

Examining the best model

mod_air_spaMM1
## formula: passengers ~ month * year_z + AR1(1 | time)
## REML: Estimation of lambda, phi and corrPars by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##                Estimate Cond. SE t-value
## (Intercept)     264.112  496.689  0.5317
## month2           -6.750    2.720 -2.4816
## month3           28.417    3.022  9.4047
## month4           25.333    3.295  7.6874
## month5           30.083    3.548  8.4787
## month6           69.917    3.784 18.4777
## month7          109.583    4.006 27.3575
## month8          109.333    4.216 25.9352
## month9           60.667    4.416 13.7392
## month10          24.833    4.607  5.3906
## month11          -8.917    4.790 -1.8614
## month12          20.083    4.967  4.0435
## year_z           97.700   16.021  6.0984
## month2:year_z   -11.820    2.705 -4.3703
## month3:year_z    -1.414    2.941 -0.4806
## month4:year_z     3.942    3.117  1.2646
## month5:year_z    10.775    3.242  3.3239
## month6:year_z    29.054    3.321  8.7490
## month7:year_z    49.997    3.358 14.8892
## month8:year_z    48.425    3.354 14.4362
## month9:year_z    17.601    3.310  5.3174
## month10:year_z    4.436    3.223  1.3764
## month11:year_z  -10.872    3.090 -3.5185
## month12:year_z   -4.003    2.904 -1.3781
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
##                    --- Correlation parameters:
##  1.ARphi 
## 0.999958 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    time  :  247400  
##              --- Coefficients for log(lambda):
##  Group        Term Estimate Cond.SE
##   time (Intercept)    12.42   0.213
## # of obs: 144; # of groups: time, 144 
##  ------------- Residual variance  -------------
## Coefficients for log(phi) ~ 1 :
##             Estimate Cond. SE
## (Intercept)    3.526   0.1623
## Estimate of phi=residual var:  34 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -505.3647
##   p_beta,v(h) (ReL): -455.5421
## lambda leverages numerically 1 were replaced by 1 - 1e-8

Fitted values

data.for.plot$pred_spaMM <- predict(mod_air_spaMM1)
plot(obs ~ time, data = data.for.plot, type = "l", lwd = 3, ylim = c(0, 700), ylab = "Passengers")
points(pred_spaMM ~ time, data = data.for.plot, type = "l", col = "green")

Note: never extrapolate using such a model! The near-perfect fit is not unusual when ARphi is so close to 1; it is not evidence of predictive power.

Testing the effect of years

spaMM

mod_air_spaMM2ML <- fitme(passengers ~ month*year_z + AR1(1|time), data = air, method = "ML")

mod_air_no_year <- fitme(passengers ~ month + AR1(1|time), data = air, method = "ML")
## (This warning will show only once)
##  Low (<1e-12) fitted residual variance (phi): this may be a genuine result
##  for data without appropriate replicates and a model that allows overfitting, but
##  (1) this may also point to problems in the data (duplicated response values?);
##  (2) this may crash later computations. You may overcome this by setting
##      control.HLfit$min_phi to 1e-10 or some other low, but not too low, value.
##      Still, the computed likelihood maximum wrt all parameters may be inaccurate.
## Warning in spaMM::HLfit_body(init.HLfit = list(), ranFix = structure(list(:
anova(mod_air_spaMM2ML, mod_air_no_year)
##      chi2_LR df p_value
## p_v 637.7874 12       0
c(logLik(mod_air_spaMM2ML), logLik(mod_air_no_year))
##       p_v       p_v 
## -499.8833 -818.7770

Testing the effect of years

nlme

mod_air3ML <- lme(passengers ~ month * year, random = ~ 1 | year, data = air,
                 correlation = corARMA(p = 6, q = 0), method = "ML")

mod_air_no_year2 <- lme(passengers ~ month, random = ~ 1 | year, data = air,
                 correlation = corARMA(p = 6, q = 0), method = "ML")
anova(mod_air3ML, mod_air_no_year2)
##                  Model df     AIC      BIC    logLik   Test  L.Ratio p-value
## mod_air3ML           1 32 1062.79 1157.824 -499.3950                        
## mod_air_no_year2     2 20 1258.97 1318.367 -609.4851 1 vs 2 220.1804  <.0001

Spatial autocorrelation

Maximum normalised difference vegetation index (NDVI) in northern Cameroon

data("Loaloa", package = "spaMM")
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi)
##    maxNDVI latitude longitude
## X1    0.69 5.736750  8.041860
## X2    0.74 5.680280  8.004330
## X3    0.79 5.347222  8.905556
## X4    0.67 5.917420  8.100720
## X5    0.85 5.104540  8.182510
## X6    0.80 5.355556  8.929167

Visualising the data

library(maps)
spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE,
            xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))

Visualising the data

pairs(ndvi)

Fitting the model

(mod_ndvi1 <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi, method = "REML"))
## formula: maxNDVI ~ 1 + Matern(1 | longitude + latitude)
## REML: Estimation of lambda, phi and corrPars by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)   0.8006  0.02391   33.48
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
##                    --- Correlation parameters:
##      1.nu     1.rho 
## 0.4066726 0.9345780 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    longitude.  :  0.004322  
##              --- Coefficients for log(lambda):
##       Group        Term Estimate Cond.SE
##  longitude. (Intercept)   -5.444  0.1229
## # of obs: 197; # of groups: longitude., 197 
##  ------------- Residual variance  -------------
## Coefficients for log(phi) ~ 1 :
##             Estimate Cond. SE
## (Intercept)   -8.219   0.1775
## Estimate of phi=residual var:  0.0002696 
##  ------------- Likelihood values  -------------
##                        logLik
## p_v(h) (marginal L): 378.1405
##   p_beta,v(h) (ReL): 375.3261
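
The Matern family indexed by the smoothness nu generalizes the exponential model: for nu = 0.5 it reduces exactly to exp(-rho * d) (here nu is estimated near 0.41, i.e. a slightly rougher surface than exponential). A base-R check of that special case, assuming a (rho * distance) parameterization of the Matern correlation:

```r
## Matern correlation, assuming the parameterization
## C(d) = 2^(1 - nu) / gamma(nu) * (rho * d)^nu * besselK(rho * d, nu)
matern_cor <- function(d, rho, nu) {
  x <- rho * d
  2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu)
}
d <- c(0.5, 1, 2)
rho <- 0.93  # close to the estimate printed above
round(matern_cor(d, rho, nu = 0.5), 4)
round(exp(-rho * d), 4)  # identical for nu = 0.5
```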

Predictions

mapMM(mod_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Predictions

filled.mapMM(mod_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Prediction uncertainty

x.for.pred <- seq(min(ndvi$longitude), max(ndvi$longitude), length.out = 100)
y.for.pred <- seq(min(ndvi$latitude), max(ndvi$latitude), length.out = 50)
data.for.pred <- expand.grid(longitude = x.for.pred, latitude = y.for.pred)
gridpred <- predict(mod_ndvi1, newdata = data.for.pred, variances = list(predVar = TRUE))
data.for.pred$predVar <- attr(gridpred, "predVar")
m <- matrix(data.for.pred$predVar, ncol = length(y.for.pred))

Prediction uncertainty

spaMM.filled.contour(x = x.for.pred, y = y.for.pred, z = m, plot.axes = {
  points(ndvi[, c("longitude", "latitude")])}, col = spaMM.colors(redshift = 3))

Non-Gaussian response

GLMM

GLM + LMM = GLMM

\[\begin{array}{lcl} \mu &=& g^{-1}(\eta)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}b)\\ \end{array} \]

with (as for GLM):

  • \(\text{E}(\text{Y}) = \mu = g^{-1}(\eta)\)
  • \(\text{Var}(\text{Y}) = \phi\text{V}(\mu)\)


Note:

  • If \(g^{-1}\) is the identity function, \(\phi = \sigma^2\) and \(\text{V}(\mu) = 1\), we have the LMM.
  • If \(\mathbf{Z}b = 0\), we have the GLM.
  • If \(g^{-1}\) is the identity function, \(\phi = \sigma^2\), \(\text{V}(\mu) = 1\), and \(\mathbf{Z}b = 0\), we have the LM.
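
The notes above translate into a short simulation: draw Gaussian random intercepts b, build the linear predictor eta = X beta + Z b, apply the inverse link, and draw the responses. All parameter values below are hypothetical:

```r
## minimal Poisson GLMM simulation with a log link
set.seed(1)
n_groups <- 10; n_per_group <- 6
group <- factor(rep(1:n_groups, each = n_per_group))
x <- rnorm(n_groups * n_per_group)
b <- rnorm(n_groups, sd = 0.3)  # Gaussian random intercepts
eta <- 1 + 0.2 * x + b[group]   # eta = X beta + Z b
mu <- exp(eta)                  # mu = g^-1(eta), here g = log
y <- rpois(length(mu), mu)      # Var(Y) = V(mu) = mu, so phi = 1
```

Fitting glmer(y ~ x + (1 | group), family = poisson()) to such data should recover values close to the simulated ones.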

The LM2GLMM::Flatwork dataset

Flatwork
##    individual gender month shopping cleaning other absent
## 1           1      w   nov        4        8     3      0
## 2           2      w   nov       10        5    10      7
## 3           3      w   nov        3        1     2      4
## 4           4      w   nov       15        2     0      0
## 5           5      w   nov       15        9     4      0
## 6           6      m   nov        0        2     8      0
## 7           7      w   nov        5        6     6     10
## 8           8      m   nov        3        4     5     11
## 9           9      m   nov        5        7     6      4
## 10         10      m   nov        6        3     0      7
## 11          1      w   dec        7        6     3      1
## 12          2      w   dec        6        2     3      7
## 13          3      w   dec        5        7     1      5
## 14          4      w   dec       14        5     1      0
## 15          5      w   dec        5        5     1      0
## 16          6      m   dec        6        1     0      5
## 17          7      w   dec        3        1     0      1
## 18          8      m   dec        2        1     0      0
## 19          9      m   dec        4        0     0      5
## 20         10      m   dec        7        3     1      3
## 21          1      w   jan       11        3     3      1
## 22          2      w   jan        3        8     2      7
## 23          3      w   jan        7        7    12      4
## 24          4      w   jan        5        2     5      0
## 25          5      w   jan        6        6     0      0
## 26          6      m   jan       13        0     5      2
## 27          7      w   jan        2       16     6      6
## 28          8      m   jan        7        6     9      0
## 29          9      m   jan        0        4     2      5
## 30         10      m   jan        6        3     9      7
## 31          1      w   feb        8        3     1      7
## 32          2      w   feb        2        4     1      6
## 33          3      w   feb        1        7    16      1
## 34          4      w   feb        0        0     0      0
## 35          5      w   feb       12       10     6      0
## 36          6      m   feb        7        0     1      0
## 37          7      w   feb        3        2    10      6
## 38          8      m   feb        7        4     3      0
## 39          9      m   feb        0        0     3      7
## 40         10      m   feb       16        2     0      0
## 41          1      w   mar        0        0     0      0
## 42          2      w   mar       17        2     5     11
## 43          3      w   mar        0        6     3      3
## 44          4      w   mar       11        5     6     12
## 45          5      w   mar        8       15    12      0
## 46          6      m   mar       10        0    14      0
## 47          7      w   mar        6        5     6      7
## 48          8      m   mar        8        6     6      0
## 49          9      m   mar       11        0     0      8
## 50         10      m   mar       10        2     1      0
## 51          1      w   apr        4        2     0      0
## 52          2      w   apr        9        9    12      0
## 53          3      w   apr       19        3     2      3
## 54          4      w   apr       14        4     6      6
## 55          5      w   apr        3        1     5      0
## 56          6      m   apr        4        1     0      0
## 57          7      w   apr       10       13     8      0
## 58          8      m   apr        0        2     2     10
## 59          9      m   apr        1        6     0      0
## 60         10      m   apr        4        4     1      0

The LM2GLMM::Flatwork dataset

str(Flatwork)
## 'data.frame':    60 obs. of  7 variables:
##  $ individual: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender    : Factor w/ 2 levels "m","w": 2 2 2 2 2 1 2 1 1 1 ...
##  $ month     : Factor w/ 6 levels "apr","dec","feb",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ shopping  : int  4 10 3 15 15 0 5 3 5 6 ...
##  $ cleaning  : int  8 5 1 2 9 2 6 4 7 3 ...
##  $ other     : int  3 10 2 0 4 8 6 5 6 0 ...
##  $ absent    : int  0 7 4 0 0 0 10 11 4 7 ...

GLMM with lme4

library(lme4)
(mod_glmm_lme4 <- glmer(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork))
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: poisson  ( log )
## Formula: shopping ~ gender + (1 | individual) + (1 | month)
##    Data: Flatwork
##       AIC       BIC    logLik  deviance  df.resid 
##  421.7239  430.1012 -206.8619  413.7239        56 
## Random effects:
##  Groups     Name        Std.Dev.
##  individual (Intercept) 0.2261  
##  month      (Intercept) 0.0510  
## Number of obs: 60, groups:  individual, 10; month, 6
## Fixed Effects:
## (Intercept)      genderw  
##      1.7107       0.2159

GLMM with spaMM

(mod_glmm_spaMM <- fitme(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork, method = "ML"))
## formula: shopping ~ gender + (1 | individual) + (1 | month)
## Estimation of corrPars and lambda by Laplace ML approximation (p_v).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Estimation of lambda by 'outer' ML, maximizing p_v.
## Family: poisson ( link = log ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)   1.7107   0.1440  11.879
## genderw       0.2159   0.1813   1.191
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    individual  :  0.05113 
##    month  :  0.002602  
## # of obs: 60; # of groups: individual, 10; month, 6 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -206.8619

Checking residuals

library(DHARMa)
r <- simulateResiduals(mod_glmm_lme4)
plot(r)
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T

Extra 0s?

barplot(table(Flatwork$shopping))

Extra 0s?

testZeroInflation(r)

## 
##  DHARMa zero-inflation test via comparison to expected zeros with simulation under H0 = fitted model
## 
## data:  r
## ratioObsExp = 30.702, p-value < 2.2e-16
## alternative hypothesis: more
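The `ratioObsExp` statistic compares the number of zeros actually observed with the number expected under the fitted model, obtained by simulation. A minimal base-R sketch of that logic (toy data, not the course model):

```r
## Sketch of the zero-inflation test idea: simulate from the fitted model
## and compare observed vs. expected zeros (toy data, base R only).
set.seed(123)
n <- 200
y <- rpois(n, lambda = 3)
y[sample(n, 60)] <- 0                       # inject extra zeros
fit <- glm(y ~ 1, family = poisson())
lambda_hat <- exp(unname(coef(fit)[1]))     # fitted Poisson mean
sim_zeros <- replicate(1000, sum(rpois(n, lambda_hat) == 0))
ratioObsExp <- sum(y == 0) / mean(sim_zeros)
ratioObsExp                                 # clearly > 1: too many zeros
```

DHARMa does the equivalent simulation from the full mixed model, which is why its ratio above is so extreme.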

Binomial model

Flatwork$shopping_bin <- Flatwork$shopping > 0
(mod_glmm_lme4bin <- glmer(shopping_bin ~ gender + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork))
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: shopping_bin ~ gender + (1 | individual) + (1 | month)
##    Data: Flatwork
##      AIC      BIC   logLik deviance df.resid 
##  50.2791  58.6565 -21.1396  42.2791       56 
## Random effects:
##  Groups     Name        Std.Dev. 
##  individual (Intercept) 8.887e-09
##  month      (Intercept) 4.127e-08
## Number of obs: 60, groups:  individual, 10; month, 6
## Fixed Effects:
## (Intercept)      genderw  
##      1.6094       0.7885

Checking residuals

r_bin <- simulateResiduals(mod_glmm_lme4bin)
plot(r_bin)
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T

Overdispersion?

r_bin2 <- simulateResiduals(mod_glmm_lme4bin, refit = TRUE)  ## slow and convergence issues...
testOverdispersion(r_bin2)
## 
##  DHARMa nonparametric overdispersion test via comparison to simulation under H0 = fitted model
## 
## data:  r_bin2
## dispersion = 1.2008, p-value = 0.064
## alternative hypothesis: overdispersion

Testing the gender effect

mod_glmm_lme4bin0 <- glmer(shopping_bin ~ 1 + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)

anova(mod_glmm_lme4bin, mod_glmm_lme4bin0)
## Data: Flatwork
## Models:
## mod_glmm_lme4bin0: shopping_bin ~ 1 + (1 | individual) + (1 | month)
## mod_glmm_lme4bin: shopping_bin ~ gender + (1 | individual) + (1 | month)
##                   Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## mod_glmm_lme4bin0  3 49.228 55.511 -21.614   43.228                         
## mod_glmm_lme4bin   4 50.279 58.657 -21.140   42.279 0.9485      1     0.3301
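The p-value in this table can be recomputed by hand from the deviance difference (up to rounding):

```r
## Likelihood-ratio test by hand: the chi-square statistic is the
## difference in deviances, on 1 df (one fixed effect removed).
chi2 <- 43.228 - 42.279
pchisq(chi2, df = 1, lower.tail = FALSE)    # ~ 0.33, as in the anova table
```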

Same with spaMM

mod_glmm_spaMMbin <- fitme(shopping_bin ~ gender + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)
## Fits using Laplace approximation may diverge for all-or-none binomial data:
##  check PQL or PQL/L methods in that case.
mod_glmm_spaMMbin0 <- fitme(shopping_bin ~ 1 + (1|individual) + (1|month), family = binomial(),
                        data = Flatwork)

anova(mod_glmm_spaMMbin, mod_glmm_spaMMbin0)
##       chi2_LR df   p_value
## p_v 0.9485333  1 0.3300929

Is there an effect for the non-zeros?

This is not ideal, but we will try an analysis on the truncated distribution…

Flatwork_pos <- subset(Flatwork, Flatwork$shopping_bin)
barplot(table(Flatwork_pos$shopping))
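To see why this is "not ideal": dropping the zeros means the data follow a zero-truncated distribution, whose mean exceeds the Poisson mean. A base-R sketch (illustrative lambda, not fitted to these data):

```r
## Zero-truncated Poisson: P(Y = k | Y > 0) = dpois(k) / (1 - dpois(0)).
## Its mean is lambda / (1 - exp(-lambda)), i.e. larger than lambda.
lambda <- 2.5
k <- 1:50
p_trunc <- dpois(k, lambda) / (1 - dpois(0, lambda))
mean_trunc <- sum(k * p_trunc)
c(lambda = lambda, truncated_mean = mean_trunc)
```

Fitting an untruncated Poisson or negative binomial to the positives therefore biases the mean upward, which is the compromise accepted below.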

Fitting models on truncated distributions

mod_glmm_lme4pos1 <- glmer(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork_pos)
mod_glmm_lme4pos2 <- glmer.nb(shopping ~ gender + (1|individual) + (1|month), data = Flatwork_pos)
mod_glmm_spaMMpos1 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = poisson(),
                        data = Flatwork_pos)
mod_glmm_spaMMpos2 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = spaMM::negbin(),
                        data = Flatwork_pos)
mod_glmm_spaMMpos3 <- fitme(shopping ~ gender + (1|individual) + (1|month), family = COMPoisson(),
                        data = Flatwork_pos)

c(AIC(mod_glmm_lme4pos1), AIC(mod_glmm_lme4pos2))
## [1] 328.9853 306.3538
print(rbind(AIC(mod_glmm_spaMMpos1), AIC(mod_glmm_spaMMpos2), AIC(mod_glmm_spaMMpos3)))
##             marginal AIC:     conditional AIC:      dispersion AIC:        effective df:
## [1,]             328.9853             318.2144             324.9853             43.82578
## [2,]             306.3538             302.3538             302.3538             50.99975
## [3,]             359.0022             347.6116             355.0022             41.75854

Checking residuals

r_pos <- simulateResiduals(mod_glmm_lme4pos2)
plot(r_pos)
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T
## DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T

Testing the gender effect again

mod_glmm_spaMMpos20 <- fitme(shopping ~ 1 + (1|individual) + (1|month), family = spaMM::negbin(),
                        data = Flatwork_pos)

anova(mod_glmm_spaMMpos20, mod_glmm_spaMMpos2)
##       chi2_LR df   p_value
## p_v 0.4453586  1 0.5045474


Practice


What about gender and cleaning?

Non-Gaussian random effects

The Hierarchical GLM (HGLM)

\[\begin{array}{lcl} \mu &=& g^{-1}(\eta)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}b)\\ \mu &=& g^{-1}(\mathbf{X}\beta + \mathbf{Z}f(u)) \end{array} \]


Note:

  • If \(f\) is the identity function and \(u\) is drawn from a normal distribution, then we recover the GLMM, a particular case of the more general HGLM.
  • Hence LM, GLM, LMM and GLMM are all particular cases of the HGLM.

The example of the negative binomial

library(MASS)

mod_negbin <- glm.nb(Days ~ Sex/Age, data = quine)

quine$index <- factor(1:nrow(quine))

mod_poiss_gamma <- fitme(Days ~ Sex/Age + (1|index), data = quine,
                         family = poisson(), rand.family = Gamma("log"))
## Warning in .check_conv_glm_reinit(): spaMM_glm.fit() did not always converge (criterion: max = 0.101, latest = 0.101 )
rbind(mod_negbin$coefficients, mod_poiss_gamma$fixef)
##      (Intercept)       SexM SexF:AgeF1 SexM:AgeF1  SexF:AgeF2 SexM:AgeF2 SexF:AgeF3 SexM:AgeF3
## [1,]    2.928524 -0.3957609 -0.3659809 -0.5868525 -0.01502935  0.6211936 -0.2894662  0.7709794
## [2,]    2.928524 -0.3957609 -0.3659809 -0.5868525 -0.01502935  0.6211936 -0.2894662  0.7709794


Note: the equivalence is expected! In more complex models differences may appear.
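The Poisson-Gamma mixture behind this equivalence can be checked by simulation: a Poisson count whose rate carries a Gamma-distributed multiplicative frailty is marginally negative binomial, with variance mu + mu^2/shape. A base-R sketch (parameter values are illustrative):

```r
## Poisson with Gamma frailty = negative binomial (checked by simulation).
set.seed(1)
n <- 1e5
shape <- 2                                   # Gamma shape = negbin 'size'
mu <- 5                                      # marginal mean
u <- rgamma(n, shape = shape, rate = shape)  # mean-1 multiplicative frailty
y <- rpois(n, lambda = mu * u)
c(mean = mean(y), var = var(y),
  negbin_var = mu + mu^2 / shape)            # var(y) matches mu + mu^2/shape
```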

The beta-binomial HGLM


\[ \begin{array}{lcl} \text{logit}(p) &=& \text{ln}\left(\frac{p}{1-p}\right) = \mathbf{X}\beta + \mathbf{Z}b\\ b &=& \text{logit}(u) = \text{ln}\left(\frac{u}{1-u}\right) \end{array} \]


with \(u\) following the beta distribution.


It can be useful to model heterogeneity in proportions!
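The extra variance induced by such heterogeneity is easy to demonstrate by simulation: drawing each trial's success probability from a beta distribution inflates the binomial variance by a factor 1 + (size - 1) * rho, with rho = 1/(a + b + 1). A base-R sketch with illustrative parameters:

```r
## Beta-binomial overdispersion relative to a plain binomial.
set.seed(1)
n <- 1e5; size <- 20
p <- rbeta(n, shape1 = 2, shape2 = 2)   # heterogeneous probabilities, mean 0.5
y_bb <- rbinom(n, size, p)              # beta-binomial draws
y_b  <- rbinom(n, size, 0.5)            # plain binomial, same mean
c(var_betabin = var(y_bb),              # ~ 24: inflated by 1 + 19 * (1/5)
  var_bin = var(y_b))                   # ~ 5 = size * p * (1 - p)
```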

The spaMM::seeds dataset

data(seeds)
seeds
##    plate seed  extract  r  n
## 1      1  O75     Bean 10 39
## 2      2  O75     Bean 23 62
## 3      3  O75     Bean 23 81
## 4      4  O75     Bean 26 51
## 5      5  O75     Bean 17 39
## 6      6  O73     Bean  8 16
## 7      7  O73     Bean 10 30
## 8      8  O73     Bean  8 28
## 9      9  O73     Bean 23 45
## 10    10  O73     Bean  0  4
## 11    11  O75 Cucumber  5  6
## 12    12  O75 Cucumber 53 74
## 13    13  O75 Cucumber 55 72
## 14    14  O75 Cucumber 32 51
## 15    15  O75 Cucumber 46 79
## 16    16  O75 Cucumber 10 13
## 17    17  O73 Cucumber  3 12
## 18    18  O73 Cucumber 22 41
## 19    19  O73 Cucumber 15 30
## 20    20  O73 Cucumber 32 51
## 21    21  O73 Cucumber  3  7

The spaMM::seeds dataset

coplot(r/n ~ plate | seed + extract, data = seeds)

Fitting the germination data using the HGLM

(mod_germ1 <- fitme(cbind(r, n - r) ~ seed * extract + (1|plate), family = binomial(), rand.family = Beta(),
                   data = seeds, method = "REML"))
## formula: cbind(r, n - r) ~ seed * extract + (1 | plate)
## Estimation of lambda by Laplace REML approximation (p_bv).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Family: binomial ( link = logit ) 
##  ------------ Fixed effects (beta) ------------
##                         Estimate Cond. SE t-value
## (Intercept)              -0.5483   0.1896 -2.8922
## seedO73                   0.0768   0.3070  0.2502
## extractCucumber           1.3526   0.2687  5.0330
## seedO73:extractCucumber  -0.8313   0.4279 -1.9429
##  --------------- Random effects ---------------
## Family: Beta ( link = logit ) 
##            --- Variance parameters ('lambda'):
## lambda = 4 var(u)/(1 - 4 var(u)) for u ~ Beta[1/(2*lambda),1/(2*lambda)]; 
##    plate  :  0.02378  
##              --- Coefficients for log(lambda):
##  Group        Term Estimate Cond.SE
##  plate (Intercept)   -3.739  0.5364
## # of obs: 21; # of groups: plate, 21 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -54.05829
##   p_beta,v(h) (ReL): -56.59763

Comparison to the binomial GLMM

mod_germ2 <- fitme(cbind(r, n - r) ~ seed * extract + (1|plate), family = binomial(),
                   data = seeds, method = "REML")

print(rbind(AIC(mod_germ1), AIC(mod_germ2)))
##             marginal AIC:     conditional AIC:      dispersion AIC:        effective df:
## [1,]             118.1166             110.5305             115.1953            10.116005
## [2,]             118.0763             110.5009             115.0612             9.939376


OK… here it does not make much of a difference, but it is still worth trying.

Heteroscedasticity

Let's revisit the rats

mod_rat_spaMM <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = body,
                            method = "REML")
coplot(residuals(mod_rat_spaMM) ~ I(1:nrow(body)) | body$Diet, show.given = FALSE)
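As a reminder, the AR1(1|Time) term assumes a first-order autoregressive correlation structure between equally spaced time points: cor decays as phi^|i - j|. A base-R sketch of the implied correlation matrix (the phi value is hypothetical):

```r
## AR1 correlation structure: cor(e_i, e_j) = phi^|i - j|.
phi <- 0.6                                # hypothetical autocorrelation
times <- 1:5
C <- phi^abs(outer(times, times, "-"))    # 5 x 5 correlation matrix
round(C, 3)                               # 1 on the diagonal, geometric decay off it
```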

Let's revisit the rats

mod_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = body,
                        method = "REML", resid.formula = ~ Diet)
summary.tables <- summary(mod_rat_hetero)
summary.tables$phi_table
##             Estimate  Cond. SE
## (Intercept) 2.826213 0.1199502
print(rbind(AIC(mod_rat_spaMM),
            AIC(mod_rat_hetero)))
##             marginal AIC:     conditional AIC:      dispersion AIC:        effective df:
## [1,]             1174.127             1035.876             1147.114             139.0042
## [2,]             1174.127             1035.876             1147.114             139.0042

Let's re-test the overall effect of the diet

mod_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = body,
                           HLmethod = "ML", resid.formula = ~ Diet)
## 'HLmethod' argument for fitme() may become obsolete: use 'method' instead.
mod_rat_hetero0 <- fitme(weight ~ Time + (Time|Rat) + AR1(1|Time), data = body,
                           method = "ML", resid.formula = ~ Diet)

anova(mod_rat_hetero, mod_rat_hetero0)
##      chi2_LR df      p_value
## p_v 47.77998  4 1.048901e-09

You can handle heteroscedasticity in simple models too!

set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))

m <- fitme(y ~ group, resid.model = ~ group, data = d, method = "REML")
unique(m$phi)
## [1]  8.067621 18.350646
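These two phi estimates should sit close to the group-wise sample variances (true values 10 and 20). A quick base-R sanity check, recreating the toy data so the snippet stands alone:

```r
## Group-wise sample variances that the residual model is recovering.
set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))
tapply(d$y, d$group, var)   # close to the phi estimates above
```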

An example of many difficulties combined: IsoriX

What is IsoriX?

library(IsoriX)
## 
##  IsoriX version 0.7.1 is loaded!
## 
##  The names of the functions and objects are not yet stable.
##  We keep revising them to make IsoriX more intuitive for you to use.
##  We will do our best to limit changes in names from version 1.0 onward!!
## 
##  Type:
##     * ?IsoriX for a short description.
##     * browseVignettes(package = 'IsoriX') for tutorials.
##     * news(package = 'IsoriX') for news.

Loading the GNIPDataDE data

dim(GNIPDataDE)
## [1] 8591    7
head(GNIPDataDE)
##   stationID   lat long elev year month isoscape.value
## 1 SCHLESWIG 54.52 9.54   43 1997     6          -59.0
## 2 SCHLESWIG 54.52 9.54   43 1997     7          -56.0
## 3 SCHLESWIG 54.52 9.54   43 1997     8          -60.8
## 4 SCHLESWIG 54.52 9.54   43 1997     9          -51.0
## 5 SCHLESWIG 54.52 9.54   43 1997    10          -58.7
## 6 SCHLESWIG 54.52 9.54   43 1997    11          -74.6

Aggregate the GNIPDataDE

GNIPDataDE_agg <- prepdata(data = GNIPDataDE)
dim(GNIPDataDE_agg)
## [1] 27  7
head(GNIPDataDE_agg)
##       stationID isoscape.value var.isoscape.value n.isoscape.value   lat  long elev
## 1        ARKONA      -60.99231           247.8767              134 54.67 13.43   42
## 2        ARTERN      -61.00653           510.5720              199 51.37 11.29  164
## 3 BAD SALZUFLEN      -53.80770           310.1046              408 52.10  8.75  135
## 4        BERLIN      -57.53477           395.6484              419 52.46 13.40   48
## 5  BRAUNSCHWEIG      -52.73350           339.4752              420 52.29 10.44   81
## 6      CUXHAVEN      -49.25271           221.6594              420 53.87  8.70    5

IsoriX using IsoriX

(Europefit <- isofit(iso.data = GNIPDataDE_agg, mean.model.fix = list(elev = TRUE, lat.abs = TRUE)))
## 
## ### spaMM summary of the fit of the mean model ### 
## 
## formula: isoscape.value ~ 1 + elev + lat.abs + (1 | stationID) + Matern(1 | 
##     long + lat)
## REML: Estimation of lambda, phi and corrPars by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##               Estimate  Cond. SE t-value
## (Intercept) -145.78873 65.386537  -2.230
## elev          -0.01163  0.002449  -4.750
## lat.abs        1.68318  1.210013   1.391
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
## Distance(s): Earth 
##                    --- Correlation parameters:
##         2.nu        2.rho 
## 0.5000000000 0.0001176491 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    stationID  :  6.993e-10 
##    long + lat  :  482  
##              --- Coefficients for log(lambda):
##       Group        Term Estimate   Cond.SE
##   stationID (Intercept)  -21.081 2721.6553
##  long + lat (Intercept)    6.178    0.3232
## # of obs: 27; # of groups: stationID, 27; long + lat, 27 
##  ------------- Residual variance  -------------
## Prior weights: 134 199 408 419 420 ...
## phi was fixed by an offset term:  "phi" ~ 0 + offset(pred.disp) 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -71.72788
##   p_beta,v(h) (ReL): -71.74299
## lambda leverages numerically 1 were replaced by 1 - 1e-8 
## $family
## 
## Family: gaussian 
## Link function: identity 
## 
## 
## $beta_table
##                  Estimate    Cond. SE   t-value
## (Intercept) -145.78873189 65.38653670 -2.229644
## elev          -0.01163181  0.00244856 -4.750470
## lat.abs        1.68317611  1.21001275  1.391040
## 
## $lambda_table
##                 Group        Term  Estimate      Cond.SE
## stationID   stationID (Intercept) -21.08087 2721.6552629
## long + lat long + lat (Intercept)   6.17794    0.3232431
## 
## $likelihoods
## p_v(h) (marginal L):   p_beta,v(h) (ReL): 
##            -71.72788            -71.74299 
## 
## 
## 
## ### spaMM summary of the fit of the residual dispersion model ### 
## 
## formula: var.isoscape.value ~ 1 + (1 | stationID) + Matern(1 | long + 
##     lat)
## Estimation of corrPars and lambda by Laplace REML approximation (p_bv).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Estimation of lambda by 'outer' REML, maximizing p_bv.
## Family: Gamma ( link = log ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)    5.902    1.186   4.976
##  --------------- Random effects ---------------
## Family: gaussian ( link = identity ) 
## Distance(s): Earth 
##                    --- Correlation parameters:
##         2.nu        2.rho 
## 0.5000000000 0.0001176491 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    stationID  :  0.00736 
##    long + lat  :  1.491  
## # of obs: 27; # of groups: stationID, 27; long + lat, 27 
##  -- Residual variation ( var = phi * mu^2 )  --
## Prior weights: 133 198 407 418 419 ...
## phi was fixed to 2 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -160.5648
##   p_beta,v(h) (ReL): -159.4753
## $family
## 
## Family: Gamma 
## Link function: log 
## 
## 
## $beta_table
##             Estimate Cond. SE  t-value
## (Intercept) 5.901541 1.185956 4.976189
## 
## $lambda_table
## [1] Group Term 
## <0 rows> (or 0-length row.names)
## 
## $phi_outer
## [1] 2
## attr(,"type")
## [1] "fix"
## 
## $likelihoods
## p_v(h) (marginal L):   p_beta,v(h) (ReL): 
##            -160.5648            -159.4753 
## 
## 
## [models fitted with spaMM version 2.4.8] 
## 
## NULL

The data for the elevation

library(rasterVis)
plot(ElevRasterDE)

IsoriX using IsoriX

isoscape <- isoscape(elevation.raster = ElevRasterDE, isofit = Europefit)

plot.mean <- plot(x = isoscape, which = "mean", plot = FALSE)

plot.disp <- plot(x = isoscape, which = "disp", plot = FALSE)

Mean prediction of the distribution of Deuterium

plot.mean

Prediction of the residual variance in Deuterium

plot.disp

IsoriX using spaMM

disp.fit <- fitme(var.isoscape.value ~ 1 + Matern(1 | long + lat), family = Gamma(log),
                  prior.weights = n.isoscape.value - 1, method = "REML", fixed = list(phi = 2),
                  control.dist = list(dist.method = "Earth"), data = GNIPDataDE_agg)
GNIPDataDE_agg$pred.disp <- predict(disp.fit)[, 1]
mean.fit <- fitme(isoscape.value ~ lat + elev + Matern(1 | long + lat), family = gaussian(), 
                  prior.weights = n.isoscape.value, method = "REML",
                  resid.model = list(formula = ~ 0 + offset(pred.disp), family = Gamma(identity)),
                  control.dist = list(dist.method = "Earth"), data = GNIPDataDE_agg)
Europefit2 <- list(mean.fit = mean.fit, disp.fit = disp.fit)
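As an aside, the Matérn correlation behind Matern(1 | long + lat) is easy to sketch in base R; for smoothness nu = 0.5 it reduces to the exponential correlation exp(-rho * d). This helper is illustrative only, not spaMM's internal code:

```r
## Matern correlation function: for nu = 0.5 it equals exp(-rho * d).
matern_cor <- function(d, nu, rho) {
  x <- rho * d
  out <- rep(1, length(x))              # correlation is 1 at distance 0
  pos <- x > 0
  out[pos] <- (2^(1 - nu) / gamma(nu)) * x[pos]^nu * besselK(x[pos], nu)
  out
}
matern_cor(c(0, 1, 2), nu = 0.5, rho = 1)   # equals exp(-c(0, 1, 2))
```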

Predictions

isoscape2 <- isoscape(elevation.raster = ElevRasterDE, isofit = Europefit2)
plot(x = isoscape2, which = "mean", plot = TRUE)

DHGLM

system.time(
  dhglm <- corrHLfit(isoscape.value ~ lat + elev + Matern(1 | long + lat), family = gaussian(), 
              HLmethod = "REML", data = GNIPDataDE, control.dist = list(dist.method = "Earth"),
              resid.model = list(formula = ~ 1 + Matern(1 | long + lat),
                                 control.dist = list(dist.method = "Earth"),
                                 family = Gamma(log), fixed = list(phi = 2)),
                                 verbose = c(TRACE = TRUE))
)

What you need to remember

  • how to handle temporal and spatial autocorrelation
  • that GLMMs are not very difficult if you already know GLMs and LMMs
  • that random effects can also have non-Gaussian distributions
  • that there are even more general methods than GLMMs out there: HGLM and DHGLM
  • how to handle heteroscedasticity

Table of contents

Mixed-effects models